Supplement of “Limited-memory Common-directions Method for Distributed Optimization and its Application on Empirical Risk Minimization”
II More Experiments

In this section we present experimental results that are not included in the main paper. We consider the same experimental environment and the same problem, and vary the regularization parameter C to examine the relative efficiency of the methods as the problems become easier or harder. The results for C = 10^-3 are shown in Figure (I), and the results for C = 1000 are shown in Figure (II).

For C = 10^-3, the problems are easier to solve. We observe that L-CommDir-BFGS is faster than existing methods on all data sets, and L-CommDir-Step outperforms the state of the art on all data sets except url, on which it is still competitive. For C = 1000, L-CommDir-BFGS is the fastest on most data sets; the only exception is KDD2010-b, on which it is slightly slower than L-CommDir-Step but still faster than the other methods. L-CommDir-Step, on the other hand, is slower than TRON on webspam but faster than existing methods on the other data sets. These results show that our method is highly efficient when (2.9) or (2.10) is used to decide P_k, whereas the other choice, L-CommDir-Grad, is clearly inferior in most cases.

We also modify TRON to obtain a line-search truncated Newton solver for comparison; this method is denoted by NEWTON in Figures (III)-(V). The results show that NEWTON is consistently the fastest on criteo for all choices of C, and that it outperforms L-CommDir-Step in the later stage on url for all C and on epsilon for C = 1000, but L-CommDir-BFGS and L-CommDir-Step are faster in all other cases. Therefore, in most cases, our method is the most efficient one.
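To make the role of P_k concrete, the following is a minimal single-machine sketch of the common-directions idea behind the L-CommDir variants compared above: at each iteration, a second-order model is minimized over the span of a small set of stored directions (for example, recent gradients and steps), and a backtracking line search is applied along the resulting combined direction. The toy objective, the helper names, and the simple rule used to fill P are assumptions for illustration only; the distributed implementation and the exact direction-selection rules (2.9) and (2.10) of the main paper are not reproduced here.

```python
import numpy as np

# Illustrative sketch only: a toy L2-regularized logistic regression problem
# standing in for the ERM objective, and one limited-memory common-directions
# iteration.  All names and choices below are assumptions, not the paper's code.

def make_problem(n_samples=200, n_features=20, C=1.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features))
    y = np.sign(X @ rng.standard_normal(n_features)
                + 0.1 * rng.standard_normal(n_samples))

    def fun(w):  # f(w) = 0.5*||w||^2 + C * sum_i log(1 + exp(-y_i x_i^T w))
        z = y * (X @ w)
        return 0.5 * w @ w + C * np.logaddexp(0.0, -z).sum()

    def grad(w):
        z = y * (X @ w)
        s = -1.0 / (1.0 + np.exp(z))  # derivative of log(1+exp(-z)) w.r.t. z
        return w + C * X.T @ (s * y)

    def hess_vec(w, v):  # Hessian-vector product, no explicit Hessian formed
        z = y * (X @ w)
        sig = 1.0 / (1.0 + np.exp(-z))
        return v + C * X.T @ (sig * (1.0 - sig) * (X @ v))

    return fun, grad, hess_vec

def commdir_step(w, fun, grad, hess_vec, P, c1=1e-4, beta=0.5):
    """Minimize the second-order model over span(P), then backtrack."""
    g = grad(w)
    HP = np.column_stack([hess_vec(w, P[:, j]) for j in range(P.shape[1])])
    # Small m-by-m system: (P^T H P) t = -P^T g  (tiny ridge added for safety).
    t = np.linalg.solve(P.T @ HP + 1e-10 * np.eye(P.shape[1]), -P.T @ g)
    d = P @ t                                    # combined search direction
    f0, gd, alpha = fun(w), g @ d, 1.0
    while fun(w + alpha * d) > f0 + c1 * alpha * gd:  # Armijo backtracking
        alpha *= beta
    return w + alpha * d

if __name__ == "__main__":
    fun, grad, hess_vec = make_problem()
    w = np.zeros(20)
    directions = [grad(w)]                       # seed P with the current gradient
    for k in range(10):
        P = np.column_stack(directions[-4:])     # limited memory: keep a few directions
        w_new = commdir_step(w, fun, grad, hess_vec, P)
        directions += [w_new - w, grad(w_new)]   # store the step taken and the new gradient
        w = w_new
        print(f"iter {k}: f = {fun(w):.4f}")
```

In this sketch, storing past steps versus past gradients in P roughly mirrors the difference between the L-CommDir-Step and L-CommDir-Grad flavours discussed above; the BFGS-based choice and the communication-efficient distributed evaluation of P^T H P are what the main paper develops.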